Abstract

This paper investigates trellis structures of linear block codes for the IC (integrated 
circuit) implementation of Viterbi decoders capable of achieving high decoding speed while 
satisfying a constraint on the structural complexity of the trellis in terms of the maximum 
number of states at any particular depth. Only uniform sectionalizations of the code trellis 
diagram are considered. An upper bound on the number of parallel and structurally 
identical (or isomorphic) subtrellises in a proper trellis for a code without exceeding 
the maximum state complexity of the minimal trellis of the code is first derived. Parallel 
structures of trellises with various section lengths for binary BCH and Reed-Muller (RM)
codes of lengths 32 and 64 are analyzed. Next, the complexity of IC implementation of a 
Viterbi decoder based on an L-section trellis diagram for a code is investigated. A structural 
property of a Viterbi decoder called ACS-connectivity, which is related to state connectivity,
is introduced. This parameter affects the complexity of wire-routing (interconnections
within the IC). The effect of five parameters, namely (1) effective computational complexity;
(2) complexity of the ACS-circuit; (3) traceback complexity; (4) ACS-connectivity; and (5)
branch complexity of a trellis diagram, on the VLSI complexity of a Viterbi decoder is
investigated. It is shown that an IC implementation of a Viterbi decoder based on a non-minimal
trellis requires less area and is capable of operation at higher speed than one based on the
minimal trellis when the commonly used ACS-array architecture is considered.



1. Introduction 

Any linear block code can theoretically be decoded by applying the Viterbi algorithm 
to a trellis for the code. Trellises for block codes were first described in [1]-[3]. After Forney's
refinement of the structure of these trellises [4], their potential in the practical decoding
of block codes has been realized by many others who have published extensively on various
aspects of the trellis structure of block codes [5]-[25]. In some of the above papers, one goal
was to minimize the maximum number of states in the trellis at any depth by considering
all possible permutations of the code [6]. For some codes such as Reed-Muller codes, this
optimum permutation is known [7]. For most others only bounds are known.

Even when the optimum order of bits is known or a good permutation is known (if the 
optimum order is unknown), previous work has focussed on minimization of the number of 
computations required for decoding [12, 22, 24]. If the actual decoding is intended to be
performed using a stored program approach that executes the operations needed to decode
a received vector sequentially, then this approach will lead to the fastest decoding speed.
However, if an IC implementation is intended, then an alternative approach is more suitable.
Given a constraint on the amount of hardware (determined by the number of states
and the complexity of branches) in the decoder, decoding must be done as fast as possible,
not necessarily with as few computations as possible. To achieve this end, we
propose the use of non-minimal trellises with parallel structure in which the maximum state 
space dimension is not greater than the maximum state space dimension of the minimal 
trellis of a code. In this paper, certain properties concerning the state connectivity and 
branch complexity [9] of this non-minimal trellis are derived which demonstrate that the 
non-minimal trellis implementation would require less area in an IC implementation 
than the corresponding minimal trellis when the ubiquitous ACS array architecture [26]-[28] 
is used for implementation. We caution that if a different architecture as proposed in [27] 
or [24] is chosen for implementation, then the trellis structure that is best suited will in 
general be different from the proposed trellis. 

The number of decoding operations required by the standard trellis-based Viterbi decoding
algorithm depends on the sectionalization of the trellis used for decoding. Most
previous work focussed on uniform sectionalization of a trellis, in which each section consists
of the same number of code symbols. However, Lafourcade and Vardy [22] recently showed that
non-uniform sectionalization of a trellis often results in fewer decoding operations
than uniform sectionalization. They have devised an efficient algorithm for finding the optimal
sectionalization of a trellis that minimizes the total number of decoding operations required
for maximum-likelihood trellis decoding. Optimal sectionalization of a trellis to minimize
computational complexity is also investigated in [24]. In this paper, we only investigate
good trellises with uniform sectionalization for IC implementation of Viterbi decoders. In
particular, we are concerned with those structures, such as parallel structure, regularity and
state-connectivity, that: (1) affect the complexity of wire-routing (interconnections) within
the IC and the chip size; and (2) facilitate parallel and pipelined decoding to achieve high
decoding speed. Since non-uniform sectionalization of a trellis requires fewer decoding
operations, this advantage over uniform sectionalization, along with its other properties, should be
investigated for IC implementation of Viterbi decoders to achieve high decoding speed. This
investigation is beyond the scope of this paper.

Trellises for block codes are often loosely connected. A properly constructed trellis may 
consist of many parallel and structurally identical (isomorphic) subtrellises of smaller state 
space dimension without cross-connections between them. Consequently, identical Viterbi 
decoders of much smaller complexity can be devised to process the subtrellises independently 
in parallel without internal communication between them. This not only simplifies the 
IC implementation but also speeds up the decoding process. For example, the (32,16,8) 
Reed-Muller (RM) code, also an extended BCH code, has a 4-section, 64-state minimal 
trellis diagram, which consists of eight parallel and structurally identical 8-state subtrellises 
without cross-connections among them. As a result, we can devise eight identical 8-state 
Viterbi decoders to process the 8 subtrellises in parallel without communication between 
them. At the end, there are 8 survivors (one from each subtrellis) and the best one will 
be chosen as the decoded codeword. This reduces the implementation of a 64-state Viterbi
decoder to the implementation of an 8-state decoder and the use of 8 copies of it. This parallel
structure reduces the wire-routing and internal communications within the IC, which reduces chip
size and improves decoding speed. If the state and branch complexities of each subtrellis
are small and the total number of subtrellises is small, all the subtrellis decoders can be put
on a single chip, as for the (32,16,8) RM code [29]. However, if the state and branch
complexities are large, then each subtrellis decoder (or several of them) can be implemented
on a single chip. This provides flexibility in chip plan and decoder architecture.

The two fundamental bottlenecks to Viterbi decoding (decoding speed) are the internal
communications between ACS (add-compare-select) units and comparisons of incoming
branches (radix-profile) at each state [28, 30]. Properly designed parallel structure in a
trellis would overcome these obstacles without exceeding the maximum state space dimension
of the minimal trellis. For example, a (64,40,8) RM subcode, which is being considered by
NASA for high-speed satellite communications, has an 8-section, 2048-state trellis. This trellis
consists of 32 parallel and structurally identical 64-state subtrellises. The last 4 sections of
each subtrellis are a mirror image of the first 4 sections as shown in Figure 3. As a result,
bidirectional decoding can be performed. Furthermore, the maximum component of the
radix profile for each half subtrellis is only 8. A 64-state subtrellis decoder can be
implemented on a single chip in 0.5 micron CMOS technology and can operate at a decoding
speed of 600 Mbps [31]. Other structural properties of the subtrellises for this (64,40,8) RM
subcode which simplify the IC implementation will be discussed later. Parallel structure
therefore offers simplification, flexibility and higher decoding speed for IC implementation.
We must note that the parallel structure does not reduce the total number of single-state
processors, i.e., the number of ACSs.

In this paper, we investigate trellis structures, particularly the parallel structure, of 
linear block codes for implementation of Viterbi decoders capable of achieving high decoding 
speed while satisfying a constraint on the structural complexity of the trellis in terms of the 
maximum number of states at any depth. Only uniform sectionalizations of the code trellis 
are considered. The organization of the paper is as follows. 

In Section 2, using the theory of L-section minimal trellis diagrams, an upper bound 
on the number of parallel isomorphic subtrellises in a proper trellis for a code without 
exceeding the maximum state space dimension of the minimal trellis of the code is derived. 
In Section 3, we analyze the trellises for all extended BCH and RM codes of lengths 32 and 
64. In Section 4, we define parameters related to the complexity of a Viterbi decoder IC 
using the ACS-array architecture for linear block codes. Section 5 treats examples and in 
Section 6 we use the results of this paper to design a trellis for a (64,40) RM subcode. 

2. Trellises with Parallel Structure for Linear Block 
Codes with Constraint on Maximum State Space 
Dimension 

The objective of this section is to show that we can build a trellis for a linear block code 
C which is a disjoint union of a certain desired number of parallel isomorphic subtrellises. 
Although this trellis is not minimal, its state space dimension at every depth is less than or 
equal to the maximum state space dimension of the minimal trellis. The conditions under 
which such a trellis construction is possible and an upper bound on the number of such 
parallel subtrellises are derived. In some cases, the minimal trellis itself possesses a parallel 
structure. The number of such parallel subtrellises (if any) in the minimal trellis is derived. 

2.1. Preliminaries 

We consider only binary $(N, K, d_{\min})$ linear block codes. Let $L$, $M$ be positive integers
such that $LM = N$. The minimal (up to graph isomorphism) $L$-section trellis is a well
understood graphical representation of the code [5, 9]. Let the sets of states at the end of each
section be denoted $\{S_0, S_M, S_{2M}, \ldots, S_{(L-1)M}, S_{LM}\}$. We define a sequence
$\{s_0, s_M, \ldots, s_{LM}\}$ called the state complexity profile (SCP) of the trellis and given by
$s_{iM} = \log_2(|S_{iM}|)$ for $0 \le i \le L$. The minimal $L$-section trellis of a code $C$ has the
property that every component of its SCP is less than or equal to the corresponding component
in the SCP of any other proper $L$-section trellis for $C$. The maximum among the $N + 1$
components in the SCP of the minimal $N$-section trellis ($L = N$, $M = 1$) for $C$ is denoted
$s_{\max}(C)$, and we will denote the maximum of the components in the SCP of the minimal
$L$-section trellis for $C$ as $s_{\max,L}(C)$. For a binary $N$-tuple $v = (v_1, \ldots, v_N)$, let
$p_{h,h'}[v]$ denote the $(h' - h)$-tuple $(v_{h+1}, \ldots, v_{h'})$ and let
$p_{h,h'}[C] = \{p_{h,h'}[c] : c \in C\}$. Let $C_{h,h'}$ be the linear subcode of $C$ consisting of
all codewords whose components are all zero except for the $(h' - h)$ components from the
$(h + 1)$-th bit position to the $h'$-th bit position.

In an $L$-section minimal trellis for a block code, there may be a set of parallel branches
between two adjacent states. In such a case, we call the entire set of parallel branches a
composite branch. Each composite branch in the $i$-th section, $1 \le i \le L$, is made up of
$2^{P_i}$ parallel branches where $P_i$ is the dimension of the subcode denoted $C_{(i-1)M,iM}$ [9].
In the $i$-th section of an $L$-section trellis for a linear block code, $1 \le i \le L$, the number
of distinct branch metrics that have to be computed is $2^{D_i}$ where $D_i$ is the dimension of the
code $p_{(i-1)M,iM}[C]$, and this number is much less than the total number of branches. $D_i$ is the
rank of the submatrix formed by the $M$ columns from the $((i-1)M + 1)$-th to the $(iM)$-th
column of the generator matrix of the code and is upper bounded by $M$. For $1 \le i \le L$, let
the number of composite branches merging into any state $s \in S_{iM}$ be $2^{\delta_{iM}}$ (it is the
same for any state in $S_{iM}$). For an $L$-section trellis for $C$, we define the converging branch
profile (CBP) as the ordered sequence $\{\delta_M, \ldots, \delta_{LM}\}$. For $0 \le i < L$, let the
number of composite branches emanating from any state $s \in S_{iM}$ be $2^{\lambda_{iM}}$ (it is the
same for any state in $S_{iM}$). The ordered sequence $\{\lambda_0, \ldots, \lambda_{(L-1)M}\}$ is
called the diverging branch profile (DBP). Then $\delta$ and $\lambda$ are related as follows:

$$\delta_{iM} = s_{(i-1)M} + \lambda_{(i-1)M} - s_{iM} \tag{2.1}$$

Based on the theory of $L$-section trellises [9], it can be shown that

$$\lambda_{(i-1)M} = \dim(C_{(i-1)M,N}) - \dim(C_{iM,N}) - P_i \tag{2.2}$$

which implies that $\lambda_{(i-1)M}$ equals the number of rows of a trellis oriented generator matrix
of $C$ whose leading 1 occurs among the positions $\{(i-1)M, (i-1)M + 1, \ldots, iM - 1\}$
and whose span is not contained in the $i$-th section. These dimensions can be easily
determined from the trellis oriented generator matrix of the code [9, 16, 20]. The two sequences,
$\{\delta_M, \delta_{2M}, \ldots, \delta_{LM}\}$ and $\{\lambda_0, \lambda_M, \ldots, \lambda_{(L-1)M}\}$, provide
a measure of the state connectivity of an $L$-section minimal trellis. In IC implementation of a
Viterbi decoder, $\delta_{iM}$ is called a radix number.
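
The profiles defined above can be read off the row spans of a trellis oriented generator matrix. The following Python sketch is an illustration only, not part of the original development. It assumes that bit positions are numbered from 1, so that section $i$ covers bits $(i-1)M + 1$ through $iM$, that relation (2.1) has the form $\delta_{iM} = s_{(i-1)M} + \lambda_{(i-1)M} - s_{iM}$ (obtained by counting the composite branches of a section from both ends), and that the state space dimension at a depth equals the number of rows active there. The function names and the four sample spans are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (leading 1, trailing 1), bit positions numbered from 1


def state_dim(spans: List[Span], depth: int) -> int:
    """Number of rows active at this depth, i.e. first <= depth < last."""
    return sum(1 for first, last in spans if first <= depth < last)


def branch_profiles(spans: List[Span], n: int, m: int):
    """Return (DBP, CBP, P) for a uniform sectionalization with m bits per section."""
    assert n % m == 0, "section length must divide the code length"
    dbp, cbp, par = [], [], []
    for i in range(1, n // m + 1):
        lo, hi = (i - 1) * m + 1, i * m  # bits covered by section i
        starting = [(f, t) for f, t in spans if lo <= f <= hi]
        p_i = sum(1 for _, t in starting if t <= hi)  # rows confined to section i
        lam = len(starting) - p_i  # equation (2.2)
        # Assumed form of (2.1): branch counts seen from both ends must agree.
        delta = state_dim(spans, lo - 1) + lam - state_dim(spans, hi)
        dbp.append(lam)
        cbp.append(delta)
        par.append(p_i)
    return dbp, cbp, par


# Row spans of a four-row, length-8 trellis oriented matrix (illustrative input).
SPANS = [(1, 4), (2, 7), (3, 6), (5, 8)]
print(branch_profiles(SPANS, n=8, m=2))
print(branch_profiles(SPANS, n=8, m=4))
```

For these spans, the 4-section trellis has DBP (2, 1, 1, 0) and CBP (0, 1, 1, 2) with no parallel branches, while the 2-section trellis has DBP (2, 0), CBP (0, 2) and $P_1 = P_2 = 1$.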

2.2. Parallel Trellises 

Let $G$ be the trellis oriented generator matrix of an $(N, K)$ linear block code $C$ [4]. Let
$r = (r_1, r_2, \ldots, r_N)$ be a typical row of $G$. Then, we define the span of $r$, denoted
$\mathrm{span}(r)$, to be the smallest interval $[i, j]$, $1 \le i \le j \le N$, which contains all the
non-zero elements of $r$. For a row $r$ whose span is $[i, j]$ we also define an active span of $r$,
denoted $\mathrm{aspan}(r)$, as $[i, j-1]$ if $i < j$ and $\mathrm{aspan}(r) = \emptyset$ if $i = j$.
The trellis oriented matrix has the following properties: (1) The leading 1 of every row occurs
in an earlier position than the leading 1 of the row below it; (2) The trailing 1 of every row
occurs at a different position from the trailing 1 of every other row. Any other trellis oriented
matrix for $C$ has the same set of row spans although the rows themselves may be different [20].
Let $T$ be the minimal $N$-section trellis for $C$. Given the trellis oriented generator matrix of
a code, the state space dimension at any position $l$ is just equal to the number of rows whose
active span contains $l$ [20]. For example, consider the following trellis oriented generator matrix:

$$G = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{matrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{matrix}$$

for which $\mathrm{aspan}(r_1) = [1,3]$, $\mathrm{aspan}(r_2) = [2,6]$, $\mathrm{aspan}(r_3) = [3,5]$ and
$\mathrm{aspan}(r_4) = [5,7]$. For each $l$, $0 \le l \le 8$, counting the number of rows which are
active at that $l$ yields the state complexity profile (SCP), $(0, 1, 2, 3, 2, 3, 2, 1, 0)$. For
$0 \le l \le N$, let $s_l(C)$ denote the dimension of the $l$-th state space of $C$. Let $s_{\max}(C)$
be the maximum among the state space dimensions. Define the non-empty set,

$$I_{\max}(C) = \{l : s_l(C) = s_{\max}(C)\}. \tag{2.3}$$
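
The counting rule used in the example can be sketched in a few lines of Python. This is a minimal illustration under the definitions above, with the 4 x 8 matrix of the example as input; the helper names `active_span`, `state_profile` and `i_max` are hypothetical and not taken from the paper.

```python
from typing import List, Set

# Trellis oriented generator matrix of the example (rows r_1 .. r_4).
G = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]


def active_span(row: List[int]) -> range:
    """Positions i .. j-1 for a row with span [i, j] (1-based); empty if i == j."""
    ones = [pos for pos, bit in enumerate(row, start=1) if bit]
    return range(ones[0], ones[-1])


def state_profile(matrix: List[List[int]]) -> List[int]:
    """State space dimension at every depth l = 0 .. N."""
    n = len(matrix[0])
    return [sum(1 for row in matrix if l in active_span(row)) for l in range(n + 1)]


def i_max(profile: List[int]) -> Set[int]:
    """Depths at which the state space dimension attains its maximum, as in (2.3)."""
    top = max(profile)
    return {l for l, s in enumerate(profile) if s == top}


profile = state_profile(G)
print(profile)         # state space dimension at depths 0 .. 8
print(i_max(profile))  # depths of maximum dimension
```

For the example matrix this reproduces the profile $(0, 1, 2, 3, 2, 3, 2, 1, 0)$ and gives the depths 3 and 5 as the set of maximizing positions.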

Suppose we choose a subcode $C'$ of $C$ such that $\dim(C') = \dim(C) - 1$ and the set of coset
representatives $[C/C']$ is generated by the single row $r \in G$. From the above statement
about $s_l(C)$, it is clear that $s_l(C') = s_l(C) - 1$ for exactly those $l$ where $r$ is active, i.e.,
$l \in \mathrm{aspan}(r)$. For other positions $l \notin \mathrm{aspan}(r)$ we have $s_l(C') = s_l(C)$.
Hence we have the following proposition.

Lemma 1: If there exists a row $r$ in the trellis oriented generator matrix $G$ for the code $C$
such that $\mathrm{aspan}(r) \supseteq I_{\max}(C)$, then we can form a subcode $C'$ of $C$ generated by
$G - \{r\}$ such that $s_{\max}(C') = s_{\max}(C) - 1$ and $I_{\max}(C') \supseteq I_{\max}(C)$. □

In fact $I_{\max}(C') = I_{\max}(C) \cup \{l : s_l(C) = s_{\max}(C) - 1,\ l \notin \mathrm{aspan}(r)\}$.
Since $G$ is a trellis oriented generator matrix, $G' = G - \{r\}$ is also trellis oriented. We can
apply the above proposition again to $C'$ if there exists a row $r' \in G'$ with
$\mathrm{aspan}(r') \supseteq I_{\max}(C')$. This yields a subcode $C''$ with dimension smaller by one
and $s_{\max}(C'') = s_{\max}(C') - 1$. If no such row $r'$ exists, the proposition cannot be applied
and the recursion stops. The above proposition can be generalized.

Let $R(C)$ be the following subset of rows of $G$,

$$R(C) = \{r \in G : \mathrm{aspan}(r) \supseteq I_{\max}(C)\}. \tag{2.4}$$

Let $\rho = |R(C)|$ where $|Q|$ denotes the cardinality of any finite set $Q$.

Theorem 1: With $R(C)$ defined as above and $\rho = |R(C)|$, let $1 \le \rho' \le \rho$. There exists
a subcode $C'$ of $C$ such that $s_{\max}(C') = s_{\max}(C) - \rho'$ and $\dim(C') = \dim(C) - \rho'$ if and
only if there exists a subset $R' \subseteq R(C)$ consisting of $\rho'$ rows of $R(C)$ such that for every
$l$ satisfying $s_l(C) > s_{\max}(C')$, there exist at least $s_l(C) - s_{\max}(C')$ rows in $R'$ whose
active spans contain $l$. The set of coset representatives $[C/C']$ is generated by $R'$.

Proof: Suppose $R' = \{r'_1, \ldots, r'_{\rho'}\}$ satisfies the conditions in the hypothesis. Since
$R' \subseteq R(C)$, $I_{\max}(C) \subseteq \mathrm{aspan}(r'_i)$ for $1 \le i \le \rho'$. Consider the subcode
generated by $G - R'$. For those $l \in I_{\max}(C)$, we can determine $s_l(C')$ by counting the number
of rows $r \in (G - R')$ that are active at the position $l$. But this number is less than
$s_{\max}(C)$ by exactly $\rho'$. For $l \notin I_{\max}(C)$ and satisfying $s_l(C) > s_{\max}(C')$, we are
assured by the hypothesis that $s_l(C)$ will be reduced by at least $s_l(C) - s_{\max}(C')$, thus
guaranteeing that $s_{\max}(C') = s_{\max}(C) - \rho'$.

To prove the converse, let $C'$ be a subcode of $C$ whose dimension is $\dim(C) - \rho'$ and
satisfying $s_{\max}(C') = s_{\max}(C) - \rho'$. Without loss of generality, we may let $C'$ be generated
by $G - R'$ for some subset $R'$ of the trellis oriented generator matrix $G$ of $C$ with $|R'| = \rho'$.
Let $T$ be the minimal trellis corresponding to $G$. Let $T'$ be the minimal trellis for $C'$. Let
$N_l(R')$ be the number of rows $r'$ in $R'$ such that $l \in \mathrm{aspan}(r')$. Then, at every position
$l$, $0 \le l \le N$, we have

$$s_l(T) = s_l(C') + N_l(R') \ge s_l(C) \tag{2.5}$$

since $s_l(C)$ is the smallest possible state space dimension. Therefore

$$N_l(R') \ge s_l(C) - s_l(C'),$$
$$N_l(R') \ge s_l(C) - s_{\max}(C'). \tag{2.6}$$

For every $l$, at least $s_l(C) - s_{\max}(C')$ rows of $R'$ are active. Also, for every
$l \in I_{\max}(C)$, we have $N_l(R') \ge s_{\max}(C) - s_{\max}(C') = \rho'$. So all the rows $r' \in R'$
satisfy $\mathrm{aspan}(r') \supseteq I_{\max}(C)$. Thus $R' \subseteq R(C)$. □

The utility of the above theorem is that it shows how to choose a subcode $C'$ of $C$
with $s_{\max}(C') = s_{\max}(C) - \dim([C/C'])$, such that one can build a non-minimal trellis $T$
for $C$ with the following properties:

1. The maximum state space dimension of $T$ is $s_{\max}(C)$.

2. $T$ is the union of $2^{\dim[C/C']}$ parallel isomorphic subtrellises $T_i$, with each $T_i$ being
isomorphic to the minimal trellis for $C'$.

3. Upper Bound on Parallelism: The smallest such subcode has dimension lower
bounded by $\dim(C) - |R(C)|$, i.e., the maximum number of parallel subtrellises one can
obtain with the constraint that the total state space dimension never exceeds $s_{\max}(C)$
is upper bounded by $2^{|R(C)|}$ with $R(C)$ as defined above.
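
The condition of Theorem 1 can be checked mechanically once the row spans are known. The sketch below is an illustrative brute-force search, not the procedure used to produce the tables of this paper: it enumerates subsets of $R(C)$ of the requested size and tests the covering condition at every depth. The function name `find_removable_rows`, the 0-based row indices in its result, and the sample spans (those of the 4 x 8 matrix in the example of this section) are assumptions made for the illustration. The search is exponential in $|R(C)|$, which is acceptable only for small codes.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (leading 1, trailing 1), positions numbered from 1


def active(span: Span, depth: int) -> bool:
    """True when depth lies in the active span [first, last - 1]."""
    first, last = span
    return first <= depth < last


def profile(spans: List[Span], n: int) -> List[int]:
    """State space dimension at depths 0 .. n."""
    return [sum(active(s, l) for s in spans) for l in range(n + 1)]


def find_removable_rows(spans: List[Span], n: int, rho_prime: int) -> Optional[Tuple[int, ...]]:
    """Return indices of a subset R' meeting the condition of Theorem 1, or None."""
    scp = profile(spans, n)
    s_max = max(scp)
    peak = [l for l in range(n + 1) if scp[l] == s_max]  # I_max(C)
    # R(C): rows whose active span contains every depth of I_max(C).
    r_c = [k for k, s in enumerate(spans) if all(active(s, l) for l in peak)]
    target = s_max - rho_prime  # desired s_max(C')
    for subset in combinations(r_c, rho_prime):
        covered = all(
            sum(active(spans[k], l) for k in subset) >= scp[l] - target
            for l in range(n + 1)
            if scp[l] > target
        )
        if covered:
            return subset
    return None


SPANS = [(1, 4), (2, 7), (3, 6), (5, 8)]
print(find_removable_rows(SPANS, 8, 1))
print(find_removable_rows(SPANS, 8, 2))
print(find_removable_rows(SPANS, 8, 3))
```

For the sample spans, $R(C)$ contains the second and third rows. Removing both leaves a subcode with maximum state space dimension 1, so a trellis with 4 parallel subtrellises exists within the bound $s_{\max}(C) = 3$; asking for three rows returns `None` because $|R(C)| = 2$.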

Parallelism of the Minimal Trellis: The logarithm to the base 2 of the number of
parallel isomorphic subtrellises in a minimal $L$-section trellis for a binary $(N, K)$ linear
block code is given by the number of rows in its trellis oriented generator matrix whose
active span contains the integers $\{M, 2M, \ldots, (L-1)M\}$ where $N = LM$.
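
As a small illustration of this counting rule, the following Python sketch counts the rows whose active span contains every interior section boundary. The function name and the sample spans (again those of the 4 x 8 example matrix) are hypothetical, and $L \ge 2$ is assumed so that at least one interior boundary exists.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (leading 1, trailing 1), positions numbered from 1


def log2_parallel_subtrellises(spans: List[Span], n: int, sections: int) -> int:
    """log2 of the number of parallel subtrellises in the minimal L-section trellis."""
    assert sections >= 2 and n % sections == 0
    m = n // sections
    boundaries = range(m, n, m)  # M, 2M, ..., (L-1)M
    return sum(
        1
        for first, last in spans
        if all(first <= b < last for b in boundaries)  # b lies in [first, last - 1]
    )


SPANS = [(1, 4), (2, 7), (3, 6), (5, 8)]
for sections in (2, 4, 8):
    print(sections, log2_parallel_subtrellises(SPANS, 8, sections))
```

With these spans the 4-section minimal trellis consists of 2 parallel subtrellises, the 2-section trellis of 4, and the 8-section trellis has no parallel decomposition.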

As an example, consider the extended and permuted (32, 21, 6) BCH code. A parity check
matrix for this code with an optimum order of bits with respect to trellis state complexity
is shown in Figure 1. The set of spans of any trellis oriented generator matrix for this
code is given in Table 1. The 4-section minimal trellis has the SCP $\{0, 7, 9, 7, 0\}$ giving
$s_{\max,4}(C) = 9$. This trellis has 2 parallel isomorphic subtrellises. $I_{\max}(C) = \{16\}$ and it
can be verified that $|R(C)| = 9$. In an attempt to build a trellis consisting of 64 parallel
subtrellises while satisfying the upper bound of 9 on the maximum state space complexity,
we let $\rho' = 6$. So $s_{\max}(C) - \rho' = s_{\max}(C') = 3$. The set
$\{l : s_l(C) > s_{\max}(C')\} = \{8, 16, 24\}$.
However, we find that no subset $R'$ of $R(C)$ exists satisfying the conditions in Theorem 1.
Hence we cannot build a trellis consisting of 64 parallel subtrellises for this code without
violating the constraint on the maximum state space dimension. If we choose $\rho' = 5$, then
we can find a subset $R' = \{r_6, r_7, r_8, r_{12}, r_{15}\} \subseteq R(C)$ that satisfies all the conditions
in Theorem 1. Hence, choosing the subcode $C'$ generated by $G - R'$, we obtain a trellis $T$ for $C$
consisting of 32 parallel isomorphic subtrellises. Each subtrellis is isomorphic to the minimal
trellis for $C'$ which has $s_{\max}(C') = 4$.

For the same code, the 32-section minimal trellis has the SCP that gives $s_{\max,32}(C) = 10$
and $I_{\max}(C) = \{12, 14, 18, 20\}$. Using Table 1, we find that $|R(C)| = 2$. In an attempt to
build a trellis consisting of 4 parallel subtrellises while satisfying the upper bound of 10 on
maximum state space dimension, we let $\rho' = 2$. So $s_{\max}(C') = 8$. The set
$\{l : s_l(C) > s_{\max}(C')\} = \{10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22\}$. We find that
the two rows having spans $[8, 25]$ and $[10, 23]$ satisfy the conditions in Theorem 1.

This decomposition of a trellis into parallel and structurally identical subtrellises of 
smaller state complexity without cross-connections between them has significant advantages 
for IC implementation of Viterbi decoding. Identical Viterbi decoders of much simpler com- 
plexity can be devised to process the subtrellises independently in parallel without internal 
communications (or information transfer) between them. Internal information transfer limits 
the decoding speed [28, 30]. Furthermore, the number of computations to be carried out per 
subtrellis is much smaller than that of a fully connected trellis. As a result, the parallel struc- 
ture not only simplifies the decoding complexity but also speeds up the decoding process. 
For example, the (32, 16, 8) extended and permuted BCH code (also a Reed-Muller code) has
a 4-section trellis diagram of 64 states. It can be decomposed into 8 parallel and structurally
identical 8-state subtrellises without cross-connections between them as shown in Figure 2.
As a result, 8 identical 8-state Viterbi decoders can be devised to process the decoding in
parallel. An IC implementation of a Viterbi decoder for this code using a 0.8 micron CMOS 
technology has been recently completed at the University of Hawaii VLSI Design Center. 
The decoder is implemented in Xilinx Field Programmable Gate Array (FPGA) chips [29]. 
The decoder is capable of operating at a speed of 200 Mbps. Custom design of this decoder 
using 0.5 micron CMOS technology can achieve a decoding speed of 600 Mbps or higher. 

3. Trellises of BCH and RM Codes of Lengths 32, 64 

Based on the theory developed in the previous section, an analysis of the parallel struc- 
ture of the trellises for Reed-Muller (RM) and extended binary BCH codes was carried out. 
The degree of parallelism and the state complexity both depend on the sectionalization of the 
trellis. In general, it is known that as the number of sections decreases, the state complexity 
also decreases but the branch complexity in each section increases. We consider all possible 
uniform sectionalizations in which the number of parallel branches between two connected 
states is at most two. For example, we consider only 64-, 32-, 16- and 8-section trellises 
for the (64,42,8) RM code because the 4-section trellis has 32 parallel branches between 
any two adjacent connected states. The reason is that from an implementation and
computational viewpoint, more than 2 parallel branches between adjacent connected states
are disadvantageous.² The results of this analysis are presented in Table 4. Some surprising
results are observed. A 32-section trellis for the (32,11,12) extended BCH code can be
constructed which consists of 512 parallel 2-state subtrellises. An 8-section trellis for the
(64,42,8) RM code can be constructed which consists of 128 parallel 64-state subtrellises.
These results do not follow from the squaring construction for Reed-Muller codes or other
previously published approaches. They also provide the designer with a wide range of
trellises from which to choose. In the tables, for each code and for each possible choice
of the number of sections $L$, the logarithm to base 2 of the maximum number of parallel
subtrellises that can be obtained without exceeding the number of states in the minimal
trellis is denoted $P_{\max,L}$. The maximum state space dimension of the $L$-section subtrellis for
the subcode $C'$ is denoted $s_{\max,L}(C')$. The best known order of bit positions with respect to
state complexity of BCH codes of length 64 presented in [12, 25] was used to produce the
tables.

4. Issues in the IC Implementation of an L-section 
Trellis-Based Viterbi Decoder 

In this section, five key factors affecting the decoding speed of a Viterbi decoder based 
on the minimal and non-minimal trellis are examined. The non-minimal trellis structure 
presented in this paper reduces the internal communication and allows independent parallel 
processing of the subtrellises while decreasing the complexity of a Viterbi decoder IC. We 
substantiate this claim through analysis in the following subsections. 


4.1. Effective Computational Complexity of L-section Trellis

We consider a Viterbi decoder IC based on an $L$-section trellis with $M$ bits per section
for an $(LM, K, d_{\min})$ block code $C$. While many VLSI structures have been described for
a Viterbi decoder [26, 27, 32], the most widely implemented structure is based on
add-compare-select (ACS) circuits, wherein each abstract state in the trellis diagram manifests
itself as a physical ACS circuit on the IC and the same ACS's are repeatedly used for all
depths in the trellis. The ACS's can be labeled ACS-$i$ for $0 \le i < 2^{s_{\max,L}(C)}$.

Let $\tau_i$ be the time required to process section-$i$ of the trellis. At time $t = 0$, the
metrics of the ACS circuits corresponding to the originating state of each parallel subtrellis
are initialized to 0. After $\tau_1$ units of time, at $t = \tau_1$, the ACS-$i$ corresponding to state
$s_i$ at the end of section-1, for $0 \le i < |S_M(C)|$, has the metric of state $s_i \in S_M$. The index
of the surviving branch into $s_i$ is also stored in ACS-$i$. Continuing in this way, at time
$t = \tau_1 + \cdots + \tau_l$, $1 \le l \le L$, ACS-$i$ corresponding to state $s_i$, $0 \le i < |S_{lM}(C)|$,
will have the metric of $s_i \in S_{lM}(C)$ and a sequence of $l$ survivor branch indices corresponding
to the most likely path from the originating state (of the subtrellis to which $s_i$ belongs) to
$s_i \in S_{lM}$.

² When there are exactly two parallel branches with complementary labels, the correlation metric for one
branch is the negative of the other and hence can be obtained by a mere sign inversion.

There are as many ACS’s as the maximum number of states at any depth in the L-section 
trellis for the linear block code. In the minimal trellis, whenever the decoder is processing 
the trellis at a depth at which the state size is less than the maximum state size, a number 
of ACS circuits are idle and the hardware utilization efficiency is poor. In the non-minimal
$L$-section trellis, the utilization of the ACS circuits that exist in the IC is improved. Since
all the subtrellis decoders operate independently in parallel, from the standpoint of speed, 
the effective computational complexity of decoding a single block (a received vector) 
is defined as the computational complexity of a single parallel subtrellis (viz. the minimal 
trellis for the subcode C') plus the cost of the final comparison among the choices (survivors) 
presented by each of the subtrellises. The time required for the final comparison is small 
relative to the time required for decoding a subtrellis and this comparison can be pipelined. 
Since subtrellises are processed in parallel, the speed of operation is limited only by the time 
required to process a subtrellis. 
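
The remark on hardware utilization can be made concrete with a simple figure of merit. The sketch below is an illustration only: the utilization measure it computes (the fraction of ACS circuits doing useful work, averaged over the sections) is an assumption introduced here and is not defined in this paper. The two profiles are those of the 8-section minimal trellis of the small example in Section 2.2 and of a non-minimal trellis for the same code made of 4 parallel subtrellises.

```python
from typing import List


def acs_utilization(state_dims: List[int]) -> float:
    """Average fraction of busy ACS circuits over the sections of a trellis.

    state_dims[l] is log2 of the total number of states at depth l (l = 0 .. L).
    One ACS circuit is assumed per state at the widest depth, and an ACS is
    counted as busy in section l when it updates a state at depth l.
    """
    acs_count = 2 ** max(state_dims)
    busy = sum(2 ** d for d in state_dims[1:])
    return busy / (acs_count * (len(state_dims) - 1))


minimal = [0, 1, 2, 3, 2, 3, 2, 1, 0]      # minimal trellis of the example code
non_minimal = [2, 3, 3, 3, 2, 3, 3, 3, 2]  # 4 parallel subtrellises, same 8 ACS circuits

print(acs_utilization(minimal))
print(acs_utilization(non_minimal))
```

Under this measure the same eight ACS circuits are busy about 52% of the time with the minimal trellis and 87.5% of the time with the parallel non-minimal trellis.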

Note that both the minimal and non-minimal trellises require the same number of ACS 
circuits. However, the non-minimal trellis has a larger number of parallel subtrellises as 
compared to the minimal trellis (which often has none). Hence decoding using the non- 
minimal trellis with proper structure is faster compared to that using the minimal trellis. 
Therefore, a system bit rate specification which earlier could be met only by the use of some 
P number of Viterbi decoders operating simultaneously in parallel can be met with much 
fewer than P Viterbi decoders. In this manner, the effective computational complexity is a 
factor affecting the reduction in hardware complexity of an overall decoder. 

4.2. Complexity of the ACS circuit 

The converging branch profile (CBP), defined through the number of branches merging into a
state at each particular depth, also affects decoding speed and implementation complexity.
This is called radix in IC literature. Let $\delta_{iM}(C)$, $1 \le i \le L$, be the CBP of the minimal
trellis for $C$ with trellis oriented generator matrix $G$. At depth $l$, $1 \le l \le L$, the ACS
circuits have to perform at least $\delta_{lM}$ stages of tree-type [33] two-way comparisons to find
the best incoming branch. Hence reduction of the converging branch profile will improve the
speed of decoding and reduce the complexity of each ACS circuit. We now show that none
of the components in the CBP of the non-minimal trellis is increased. As will be shown by
examples in Section 5, most of the components of the CBP are decreased considerably.

Consider a non-minimal trellis for $C$ obtained as the union of 2 parallel subtrellises each
isomorphic to the minimal trellis for $C'$, a subcode of $C$ generated by $G - \{r\}$, $r \in G$. Let
$\delta_{iM}(C')$, $1 \le i \le L$, be the CBP of the minimal trellis for $C'$. Recall that
$s_l(C') = s_l(C)$ if $l \notin \mathrm{aspan}(r)$ and $s_l(C') = s_l(C) - 1$ if $l \in \mathrm{aspan}(r)$.
By equation (2.2), $\lambda_{(i-1)M}(C') \in \{\lambda_{(i-1)M}(C),\ \lambda_{(i-1)M}(C) - 1\}$. By
equation (2.1), $\delta_{iM}(C') > \delta_{iM}(C)$ only if $s_{(i-1)M}(C') = s_{(i-1)M}(C)$ and
$s_{iM}(C') = s_{iM}(C) - 1$. But in this case, $(i-1)M \notin \mathrm{aspan}(r)$ and
$iM \in \mathrm{aspan}(r)$. So $\lambda_{(i-1)M}(C') = \lambda_{(i-1)M}(C) - 1$. Therefore,
$\delta_{iM}(C') = \delta_{iM}(C)$.
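
The argument above can be checked numerically on a small code. The following sketch is illustrative: it assumes that (2.1) reads $\delta_{iM} = s_{(i-1)M} + \lambda_{(i-1)M} - s_{iM}$, that bit positions are numbered from 1, and it uses the row spans of the 4 x 8 example matrix of Section 2.2 as hypothetical input. For every row $r$ it compares the CBP of the code with the CBP of the subcode generated by the remaining rows.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (leading 1, trailing 1), positions numbered from 1


def cbp(spans: List[Span], n: int, m: int) -> List[int]:
    """Converging branch profile (delta_M, ..., delta_LM) from row spans."""

    def s(depth: int) -> int:
        # State space dimension: rows active at this depth.
        return sum(1 for f, t in spans if f <= depth < t)

    out = []
    for i in range(1, n // m + 1):
        lo, hi = (i - 1) * m + 1, i * m
        # Rows that start in section i and extend beyond it, as in (2.2).
        lam = sum(1 for f, t in spans if lo <= f <= hi and t > hi)
        out.append(s(lo - 1) + lam - s(hi))  # assumed form of (2.1)
    return out


SPANS = [(1, 4), (2, 7), (3, 6), (5, 8)]
full = cbp(SPANS, 8, 2)
print("C :", full)
for k in range(len(SPANS)):
    sub = cbp(SPANS[:k] + SPANS[k + 1:], 8, 2)
    print("C' without row", k + 1, ":", sub)
    assert all(a <= b for a, b in zip(sub, full))
```

In this example no component of the CBP increases when a row is removed, in agreement with the claim.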

4.3. Traceback Complexity 

Consider the problem of traceback to determine the best path through the trellis. In
the minimal trellis, the ACS-$i$ corresponding to state $s_i \in S_{lM}$ has to store $\delta_{lM}(C) + P_l(C)$
bits in order to identify which of the $2^{\delta_{lM}(C)}$ composite branches merging into $s_i$ and which
of the $2^{P_l(C)}$ parallel branches that form a composite branch survives. Therefore, in the
minimal trellis, each ACS-$i$ needs to store $\sum_{l=1}^{L} (\delta_{lM}(C) + P_l(C)) = \dim(C)$ bits in order to
identify the sequence of surviving incoming branches. In the non-minimal trellis, the storage in
number of bits required for each ACS-$i$ is $\sum_{l=1}^{L} (\delta_{lM}(C') + P_l(C')) = \dim(C')$, where $C'$ is
the subcode of $C$ corresponding to the subtrellis. Since $\dim(C') = \dim(C) - P_{\max,L}(C)$, the
ACSs in the non-minimal trellis design require less storage than in the minimal trellis. The
combined savings in storage over all the ACS circuits is significant.
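As a worked check of these sums, consider the 4-section trellises of Section 5. The CBP values are those of Tables 2 and 3. The value of one parallel-branch bit per section is an assumption read off Table 1, where rows 1, 9, 16 and 21 each lie within a single 8-bit section.

$$\sum_{l=1}^{4} \bigl(\delta_{lM}(C) + P_l(C)\bigr) = (0+1) + (4+1) + (6+1) + (7+1) = 21 = \dim(C),$$

$$\sum_{l=1}^{4} \bigl(\delta_{lM}(C') + P_l(C')\bigr) = (0+1) + (4+1) + (4+1) + (4+1) = 16 = \dim(C') = 21 - 5.$$

Under this assumption each ACS in the non-minimal design stores 5 fewer traceback bits, which over 512 ACS circuits amounts to 2560 bits.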

4.4. ACS-Connectivity 

The basic operations performed by an ACS circuit are: addition of the branch metrics of the
incoming branches to the state metrics of the corresponding originating states, comparison
of the resulting sums to find the best, and selection of the surviving sum as the new state metric
together with the corresponding surviving branch label. The ACS-array architecture is dominated
by the area required by the interconnections that transfer the state metrics [27]. For a state
$s_i \in S_{lM}$, $0 \le i < 2^{s_{lM}(C)}$, $0 \le l < L$, let $A_l(s_i)$ denote the set of states in $S_{(l+1)M}$ that are
adjacent to $s_i$. Let $A_l(s_i) = \emptyset$ if $i \ge |S_{lM}|$. Then in the ACS-array implementation of the
Viterbi decoder based on the minimal trellis, a path to transfer the state metric must exist
between ACS-$i$ and all ACS circuits that correspond to states in

$$A_0(s_i) \cup A_1(s_i) \cup \cdots \cup A_{L-1}(s_i).$$

The above set defines the connectivity of ACS-$i$ in the ACS array corresponding to state
$s_i \in S_{lM}$. The connectivity of the ACSs corresponding to states in the minimal trellis
results in a large amount of area in the VLSI chip being used for wiring [26, 27]. In
contrast, in the implementation of a Viterbi decoder based on the non-minimal trellis, the
ACS circuits can be divided into blocks [31] such that the ACSs corresponding to states
in a single subtrellis form a block. A particular ACS-$i$ needs to transfer its metric only to
a subset of the ACSs within its own block. This reduced connectivity results in a reduction
of hardware complexity and wiring area. The maximum connectivity of ACS-$i$ is upper
bounded by $2^{s_{\max,L}(C')}$ in the non-minimal trellis implementation.

4.5. Branch Complexity 

The number of distinct branch metrics that have to be computed in section-$i$ of the
trellis is a property of the code and is unaltered by the parallelization of the trellis. Most
IC decoders have a branch metric computational unit where all the branch metrics are
calculated and then transferred to the ACS circuits [26, 27]. Because of the interconnection
of branches between states in the trellis, routing the branch metrics to each of the ACS
circuits requires a large amount of chip area. The trellises we describe show improvement
over the minimal $L$-section trellis on this count because each subtrellis requires only a subset
of the set of branch metrics in section-$i$ of the trellis.

Parallelization of the minimal trellis as described in Section 2 may lead to a larger number
of total computations being performed in decoding. The number of emanating branches in
section-$(l+1)$ is $|S_{lM}| 2^{\lambda_{lM}}$, which may be larger than the corresponding product for the
minimal trellis for some values of $l$, $0 \le l < L$. However, as explained above, the hardware
complexity of the decoder is not affected. We illustrate the reason with an example. The RM
(64,42,8) code has a minimal 8-section trellis with the $s_{lM} + \lambda_{lM}$ sequence {7, 13, 16, 16, 16, 16, 13, 7}.
The same sequence for the non-minimal trellis is {13, 16, 16, 16, 16, 16, 16, 13}, which is larger
at positions {0, 1, 6, 7}. Consider the case $l = 1$ (other cases are similar). In section
2 of the minimal trellis, each of the 128 ACSs corresponding to states at the end of section
1 has 64 branches emanating from it. In section 2 of the non-minimal trellis, each of the
8192 ACSs has 8 branches emanating from it. Hence fewer operations are performed
per ACS in the non-minimal trellis, and larger values of $|S_{lM}| 2^{\lambda_{lM}}$ represent a
larger number of operations performed simultaneously in parallel by all the ACSs in the
non-minimal trellis.
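The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of the counting argument: the helper name `branches_per_acs` is hypothetical, and the state dimensions 7 and 13 at depth 1 are taken from the ACS counts 128 and 8192 quoted above.

```python
# s_lM + lambda_lM sequences quoted in the text for the RM(64,42,8) code.
total_min = [7, 13, 16, 16, 16, 16, 13, 7]
total_non = [13, 16, 16, 16, 16, 16, 16, 13]

# Positions at which the non-minimal trellis has more branches in total.
larger = [l for l, (a, b) in enumerate(zip(total_min, total_non)) if b > a]

def branches_per_acs(total_dim, state_dim):
    """Branches emanating from one state: 2**(total dimension - state dimension)."""
    return 2 ** (total_dim - state_dim)

l = 1
min_states, non_states = 2 ** 7, 2 ** 13      # 128 and 8192 ACSs at depth 1
min_per_acs = branches_per_acs(total_min[l], 7)
non_per_acs = branches_per_acs(total_non[l], 13)

print(larger)                    # [0, 1, 6, 7]
print(min_states, min_per_acs)   # 128 64
print(non_states, non_per_acs)   # 8192 8
```

The total branch count grows from $2^{13}$ to $2^{16}$, but the work done by any single ACS drops from 64 branches to 8.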

5. Examples 

Consider the (32,21,6) extended and permuted BCH code. The minimal 4-section,
8-bits/section trellis has SCP {0,7,9,7,0}. A non-minimal 4-section trellis can be
obtained as the union of 32 parallel isomorphic subtrellises, each having SCP {0,4,4,4,0}.
Thus, Viterbi decoder implementations using the ACS-array architecture for both trellises
will require 512 ACS circuits. However, in the minimal trellis, each ACS will require the
capability of choosing the best among 64 incoming branches, whereas the corresponding
number is only 16 in the non-minimal trellis. The problem of routing state metrics is also
much reduced, since in the minimal trellis the connectivity of ACS-0 is 128 and that of ACS-$i$ is at least 64 for
$1 \le i \le 511$, while the maximum connectivity of any ACS in the non-minimal trellis is only
16. The structural parameters of each of these trellises are summarized in Tables 2 and 3.

Assuming each real number to be quantized to 8 bits, the VLSI layouts of a radix-8
ACS and a radix-16 ACS were generated. A modified form of the bit-level pipelined ACS
architecture [33] was used for the ACSs. The area required for the radix-16 ACS was 2.7 times
that required for the radix-8 ACS. Assuming a factor of 2.5 increase in area per doubling
of the radix, we see that 128 ACSs in the minimal trellis have an area 6.25 times larger
than their counterparts in the non-minimal trellis implementation. The remaining ACSs
require the same area. We see that the device area is reduced by adopting the proposed
trellis architecture. Furthermore, the reduction in ACS-connectivity will yield a significant
reduction in wiring area. The savings in hardware complexity and the increase in speed due
to the non-minimal trellis approach easily outweigh the extra cost of the final comparison
among the 32 choices (one from each of the subtrellises) to find the best codeword.
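The two counts used in this example can be checked numerically. The sketch below assumes, as the text does, a constant area factor of 2.5 per doubling of the ACS radix; the function name `relative_area` is hypothetical, and the measured radix-8 to radix-16 factor of 2.7 is not used.

```python
import math

def relative_area(radix_from, radix_to, factor_per_doubling=2.5):
    """Area ratio of two ACS circuits under a fixed factor per radix doubling."""
    doublings = math.log2(radix_to / radix_from)
    return factor_per_doubling ** doublings

# Radix-64 ACS (minimal trellis) versus radix-16 ACS (non-minimal trellis).
ratio = relative_area(16, 64)
print(ratio)  # 6.25

# ACS count: the widest depth of the minimal trellis has 2**9 states, and the
# non-minimal trellis has 32 subtrellises with at most 2**4 states each.
acs_minimal = 2 ** 9
acs_non_minimal = 32 * 2 ** 4
print(acs_minimal, acs_non_minimal)  # 512 512
```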

6. Trellises for a (64,40,8) subcode of RM(64,42,8)

A (64,40,8) subcode of the Reed-Muller (64,42,8) code has been proposed to NASA for use
as the inner code in a concatenated coding system with the NASA standard (255,223,33) Reed-
Solomon code as the outer code [31]. This RM subcode achieves a 5.3 dB coding gain over
uncoded BPSK at a bit error rate of $10^{-6}$. The required speed of decoding is $960 \times 10^6$
BPSK symbols per second, which translates to an information bit rate of 600 Mbps. The
coding gain is 0.5 dB less than that of a similar scheme with the same outer
code but with the NASA standard rate-1/2, 64-state convolutional code [34] as the inner code.
However, the (64,40) RM subcode has a higher rate, 40/64 = 0.625 bits/symbol, than the
convolutional code and thus requires less bandwidth. More significant is the fact that a
Viterbi decoder for the (64,40,8) inner code can be designed to operate at higher data rates
than that for the convolutional code by using the parallelism of the trellis of the RM subcode.

The trellis for the NASA standard 64-state convolutional code does not consist of parallel 
subtrellises. 

Let $C$ denote the RM(64,42,8) code and $C'$ a (64,40) subcode of $C$. If the $L$-section
trellis on which decoding is based is composed of a union of $P$ parallel isomorphic subtrellises,
then the effective computational complexity, denoted $A_{\mathrm{eff}}(L)$, is merely that of a single
subtrellis plus the cost of obtaining the final decision by comparing the outputs of the
$P$ Viterbi decoders. The value of $L$ that minimizes $A_{\mathrm{eff}}(L)$, under the constraint that the
$L$-section trellis $T$ have a maximum state complexity not greater than $s_{\max,L}(C')$ (different
from $s_{\max,L}(C)$), is determined. Note that $s_{\max,L}(C')$ is a function of the choice of the subcode, and
we choose the subcode that has the least $s_{\max,L}(C')$ for each $L$. The complexity of
each addition, subtraction and comparison is assumed to be equal to one addition equivalent
operation.

In the following, the trellis diagrams of various sectionalizations for this RM subcode 
are given. Their effective computational complexities are computed. 

6.1. L = 4, M = 16

Let $C_0 = (16,15,2)$, $C_1 = (16,11,4)$ and $C_2 = (16,5,8)$ be the corresponding Reed-
Muller codes, $G_i$ a generator matrix of $C_i$, and $G_{i/j}$ a generator matrix for the set of coset
representatives $[C_i/C_j]$. Let $\otimes$ denote the Kronecker product. For $L = 4$, the RM(64,42)
code has a minimal trellis corresponding to the 2-level squaring construction with a state
complexity profile (SCP) {0,10,10,10,0} ($s_{\max,4}(C) = 10$) and trellis oriented generator
matrix

$$G = \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix} \otimes G_{0/1} + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \otimes G_{1/2} + \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \otimes G_2 .$$
In order to obtain a (64,40) subcode $C'$, one can delete any two of the 42 rows above, giving
a generator matrix for $C'$. The maximum state space complexity $s_{\max,4}(C')$ of the resulting
code depends on which two rows are deleted. It is easy to see that in order to have the
least $s_{\max,4}(C')$, which equals 8, we must delete any two of the 4 rows among $(1\,1\,1\,1) \otimes G_{0/1}$,
obtaining an SCP of {0,8,8,8,0} ($s_{\max,4}(C') = 8$). Using the theory developed earlier, it
can be seen that we can obtain at most 4 parallel subtrellises in any 4-section trellis for $C'$
without exceeding the allowable $s_{\max,4}$ of 8. The effective computational complexity may be
computed to give $A_{\mathrm{eff}}(4) = 39{,}682$ addition equivalent operations for the 4-section trellis.

6.2. L = 8, M = 8


Let $C_0 = (8,8,1)$, $C_1 = (8,7,2)$, $C_2 = (8,4,4)$ and $C_3 = (8,1,8)$ be Reed-Muller codes.
For $L = 8$, the RM(64,42,8) code has a minimal trellis (with 2 parallel subtrellises)
corresponding to the 3-level squaring construction with an SCP {0,7,10,13,10,13,10,7,0}
($s_{\max,8}(C) = 13$) and trellis oriented generator matrix

$$G = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} \otimes G_{0/1} + \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} \otimes G_{1/2} + \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} \otimes G_{2/3} + I_8 \otimes G_3 \qquad (6.8)$$

Here the single row of the first factor is denoted $r^0$, and the rows of the second factor
are denoted $r^1_1, r^1_2, r^1_3, r^1_4$ from top to bottom.

The (64,40) subcode $C'$ with the best SCP is obtained by deleting the row $r^0 \otimes G_{0/1}$ and
any one among the three rows $r^1_2 \otimes G_{1/2}$, where $r^1_2 = (0\,1\,0\,1\,1\,0\,1\,0)$. This code $C'$ has SCP {0,6,8,11,8,11,8,6,0}
($s_{\max,8}(C') = 11$). Repeating a similar analysis, it is seen that one can obtain at most 32
parallel subtrellises in any 8-section trellis for $C'$ without exceeding the maximum allowable
state space complexity of $s_{\max,8}(C') = 11$. Each subtrellis has the SCP {0,6,6,6,3,6,6,6,0},
and from knowledge of its trellis structure the effective complexity is $A_{\mathrm{eff}}(8) = 12{,}822$
addition equivalent operations.

6.3. L = 16, M = 4

Let $C_0 = C_1 = (4,4,1)$, $C_2 = (4,3,2)$, $C_3 = (4,1,4)$ and $C_4 = (4,0,\infty)$ be Reed-Muller
codes. For $L = 16$, the RM(64,42,8) code has a trellis oriented generator matrix given by

$$G = G_{RM(16,5,8)} \otimes G_{1/2} + G_{RM(16,11,4)} \otimes G_{2/3} + G_{RM(16,15,2)} \otimes G_{3/4}, \qquad (6.9)$$

where $G_{RM(n,k,d)}$ denotes a trellis oriented generator matrix for the $(n,k,d)$ Reed-
Muller code. For $L = 16$, the RM(64,42,8) code has a minimal trellis (with
no parallel subtrellises) corresponding to the 4-level squaring construction with an SCP
{0,4,7,10,10,13,13,13,10,13,13,13,10,10,7,4,0} ($s_{\max,16}(C) = 13$). The (64,40) subcode
$C'$ with the best SCP is generated by the matrix $G'$ obtained from $G$ by deleting one row from
each of $r_1 \otimes G_{2/3}$ and $r_2 \otimes G_{2/3}$, where $r_1$ and $r_2$ are the
two rows with spans [2,15] and [3,14] in the trellis oriented generator matrix for RM(16,11,4).
The SCP of the minimal trellis for $C'$ is {0,4,6,8,8,11,11,11,8,11,11,11,8,8,6,4,0}
($s_{\max,16}(C') = 11$). By analysis, one can obtain at most 8 parallel subtrellises in any 16-
section trellis for $C'$ without exceeding the allowable $s_{\max,16}(C')$ of 11. Each subtrellis has
SCP {0,4,6,8,6,8,8,8,5,8,8,8,6,8,6,4,0}. The resulting effective computational complexity
is $A_{\mathrm{eff}}(16) = 23{,}174$ addition equivalent operations.
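The state complexity profiles quoted for the RM(64,42,8) code in Sections 6.1 to 6.3 can be checked independently of the squaring construction. The sketch below is not the authors' procedure: it assumes the standard rank identity $s_i = \mathrm{rank}(G_{\text{past}}) + \mathrm{rank}(G_{\text{future}}) - k$ for the state space dimension at depth $i$, builds a (non trellis oriented) generator matrix of RM(3,6) from monomials of degree at most 3 in natural binary coordinate order, and evaluates the profile at the section boundaries. All function names are hypothetical.

```python
from itertools import combinations

def rm_generator(r, m):
    """Rows of a generator matrix of RM(r, m), each encoded as an n-bit integer.

    Bit j of a row is the value of one monomial of degree <= r at the point
    whose binary expansion is j (natural binary coordinate order)."""
    n = 1 << m
    rows = []
    for deg in range(r + 1):
        for chosen in combinations(range(m), deg):
            mask = sum(1 << v for v in chosen)
            row = 0
            for j in range(n):
                if j & mask == mask:   # every chosen variable equals 1 at point j
                    row |= 1 << j
            rows.append(row)
    return rows

def gf2_rank(rows):
    """Rank over GF(2) of integer-encoded rows (XOR basis keyed by leading bit)."""
    basis = {}
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]
    return len(basis)

def scp(rows, n, L):
    """State complexity profile at the L + 1 boundaries of an L-section trellis."""
    k = gf2_rank(rows)
    M = n // L
    profile = []
    for l in range(L + 1):
        past = (1 << (l * M)) - 1          # coordinates 0 .. lM-1
        future = ((1 << n) - 1) ^ past     # coordinates lM .. n-1
        profile.append(gf2_rank([x & past for x in rows])
                       + gf2_rank([x & future for x in rows]) - k)
    return profile

G = rm_generator(3, 6)                     # RM(64,42,8)
for L in (4, 8, 16):
    print(L, scp(G, 64, L))
```

Running it reproduces the three profiles stated above, with maxima 10, 13 and 13 for $L = 4$, 8 and 16.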

6.4. L = 32, M = 2 and L = 64, M = 1

When $L = 32$, $s_{\max,32}(C') = 12$. The maximum number of parallel isomorphic sub-
trellises possible without exceeding the allowable $s_{\max,32}(C') = 12$ in any 32-section trel-
lis for the (64,40) subcode $C'$ is at most 4. So $A_{\mathrm{eff}}(32) \ge 37{,}476$. When $L = 64$,
$s_{\max,64}(C') = 12$. Furthermore, no parallel subtrellises are possible without exceeding the
allowable $s_{\max,64}(C') = 12$. Hence $A_{\mathrm{eff}}(64) = 198{,}000$.
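The effective complexities obtained in Sections 6.1 to 6.4 can be compared directly. The short sketch below only tabulates the figures quoted in the text (the dictionary name `a_eff` is hypothetical); the entry for $L = 32$ is a lower bound, so the comparison is conservative for that case.

```python
# Effective computational complexity in addition equivalent operations,
# as quoted in Sections 6.1 to 6.4. The L = 32 value is only a lower bound.
a_eff = {4: 39_682, 8: 12_822, 16: 23_174, 32: 37_476, 64: 198_000}

best_L = min(a_eff, key=a_eff.get)
print(best_L, a_eff[best_L])  # 8 12822

# Ratio of the unsectionalized (L = 64) cost to the best sectionalization.
print(round(a_eff[64] / a_eff[best_L], 1))  # 15.4
```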

From the above analysis, we see that the 8-section trellis for the (64,40) RM subcode 
results in the least effective complexity. A VLSI implementation of a high-speed decoder 
for the (64,40) RM subcode is under way. The decoder is based on the 8-section trellis 
which is a union of 32 parallel isomorphic subtrellises with a maximum of 64 states each. A
schematic of the subtrellis is shown in Figure 3. Note that the last 4 sections of the subtrellis 
form a mirror image of the first 4 sections. This structure allows us to perform bidirectional 
decoding from both ends of the subtrellis simultaneously [10, 31, 35]. Sections 1 through 4 
and sections 8 to 5 (in reverse order) are processed at the same time and path information 
corresponding to the most likely paths into the center 8 states which are the destination 
states are stored. The two path metrics (one from each side) at a center state are then 
added. This gives path metrics of 8 final survivors and the path with the largest path metric 
is the most likely path through the subtrellis. Since the resolution is done at the center of 
the subtrellis, the bottleneck of decoding caused by the large radix at the center states is 
avoided. This bidirectional decoding can be achieved by either using two identical subtrellis 
decoders working from both directions or using only one decoder to process the subtrellis in 
a concurrent bidirectional execution sequence as shown in Figure 4. The second approach 
simply exploits the use of pipelining in the ACS implementation and the mirror symmetry 
of the subtrellis about the center axis. The bidirectional decoding results in advantages in 
speed and implementation. A block diagram for the overall decoder is shown in Figure 5. We 
further note that sections 2, 3, 6 and 7 of each subtrellis decompose into 8 parallel, 8-state, 
fully connected isomorphic sub-subtrellises as depicted in Figure 3. This fact can be used to 
further reduce implementation complexity and increase the decoding speed. 
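The bidirectional decoding idea can be illustrated on a toy trellis. The following Python sketch is not the decoder of [31]: it assumes a small, fully connected trellis that is narrow at the center, random integer branch metrics, and a larger-is-better path metric as in the text. The names `best_metrics` and `bidirectional_best` are hypothetical. It checks that resolving at the center states gives the same best path metric as an ordinary one-directional pass.

```python
import random

def best_metrics(sections, start_states):
    """Viterbi pass keeping the largest metric into each state.

    sections: list of sections, each a list of (from_state, to_state, metric).
    Returns a dict mapping each final state to its best path metric."""
    metric = {s: 0 for s in start_states}
    for edges in sections:
        nxt = {}
        for u, v, bm in edges:
            if u in metric:
                cand = metric[u] + bm            # add
                if v not in nxt or cand > nxt[v]:
                    nxt[v] = cand                # compare and select
        metric = nxt
    return metric

def bidirectional_best(sections, start, end):
    """Decode both halves toward the center and resolve there."""
    half = len(sections) // 2
    fwd = best_metrics(sections[:half], [start])
    # Second half processed right to left with every branch reversed.
    rev = [[(v, u, bm) for u, v, bm in edges]
           for edges in reversed(sections[half:])]
    bwd = best_metrics(rev, [end])
    # Add the two path metrics at each center state, keep the largest sum.
    return max(fwd[c] + bwd[c] for c in fwd if c in bwd)

random.seed(1)
sizes = [1, 4, 4, 2, 4, 4, 1]   # toy state counts per depth, 2 center states
sections = [[(u, v, random.randint(-9, 9)) for u in range(a) for v in range(b)]
            for a, b in zip(sizes, sizes[1:])]

one_way = best_metrics(sections, [0])[0]
assert bidirectional_best(sections, 0, 0) == one_way
```

Each half needs only three section times here instead of six, and the final resolution involves only the center states, which mirrors the argument made above for the 8 center states of the subtrellis.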
7. Conclusion


We have presented an approach for decomposing the minimal trellis of a binary linear 
block code into a non-minimal trellis composed of parallel components. This approach 
allows parallel processing of the subtrellises and does not increase the maximum number of 
states. Hence it has a significant speed advantage. In addition, it also reduces the IC area
requirements. Given a linear block code, we have estimated the limits to the benefits of 
this approach and its dependence on the uniform sectionalization of the trellis. The branch 
complexity of the non-minimal trellis relative to the minimal trellis can be larger in some 
sections. However, this does not increase the hardware complexity. Since the application of 
this method depends only on the generator matrix of the code, it can be applied to arbitrary 
linear block codes. 


Acknowledgement 

We are grateful to Dr. Toru Fujiwara of Osaka University for providing us with the
generator matrices of some extended BCH codes with the best known order of bit positions 
with respect to trellis state complexity. We are extremely grateful to Cecilia W. Chu and Eric 
Nakamura of the University of Hawaii for helpful discussions relating to the VLSI aspects 
of this paper. We thank the reviewers for their many helpful suggestions and constructive 
criticism. 


References 

[1] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, “Optimal Decoding of Linear Block Codes
for Minimizing Symbol Error Rate,” IEEE Transactions on Information Theory, Vol. 
20, pp. 284-287, 1974. 

[2] J.K. Wolf, “Efficient maximum likelihood decoding on linear block codes using a trellis,” 
IEEE Transactions on Information Theory, Vol. 24, pp. 76-80, Jan. 1978. 

[3] J.L. Massey, “Foundation and methods of channel encoding,” Proc. Intl. Conf. Infor-
mation Theory and Systems, NTG-Fachberichte, Berlin 1978.

[4] G.D. Forney, Jr., “Coset codes - Part II: Binary lattices and related codes,” IEEE 
Transactions on Information Theory, Vol. 34, pp. 1152-1187, 1988. 

[5] D.J. Muder, “Minimal trellises for block codes,” IEEE Transactions on Information
Theory, Vol. 34, pp. 1049-1053, Sept. 1988.

[6] Y. Berger and Y. Be’ery, “Bounds on the trellis size of linear block codes,” IEEE
Transactions on Information Theory, Vol. 39, 1993.

[7] T. Kasami, T. Takata, T. Fujiwara and S. Lin, “On the optimum bit orders with respect
to the state complexity of trellis diagrams of binary linear codes,” IEEE Transactions
on Information Theory, Vol. 39, Jan. 1993.

[8] T. Kasami, T. Takata, T. Fujiwara and S. Lin, “On complexity of trellis structure of 
linear block codes,” IEEE Transactions on Information Theory, Vol. 39, No. 3, pp. 
1057-1064, May 1993. 

[9] T. Kasami, T. Takata, T. Fujiwara and S. Lin, “On structural complexity of the L-
section minimal trellis diagrams for binary linear block codes,” IEICE Transactions
on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E76-A, 
No. 9, pp. 1411-1421, Sept. 1993. 

[10] T. Kasami, T. Takata, T. Fujiwara and S. Lin, “On branch labels of parallel compo- 
nents of the L-section minimal trellis diagrams for binary linear block codes,” IEICE 
Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 
Vol. E77-A, No. 6, pp. 1058-1068, June 1994. 

[11] G.D. Forney, Jr. and M.D. Trott, “The Dynamics of Group Codes: State Spaces, Trellis 
Diagrams and Canonical Encoders,” IEEE Transactions on Information Theory, Vol. 
39, No. 5, pp. 1491-1513, Sept. 1993. 

[12] A. Vardy and Y. Be’ery, “Maximum Likelihood Soft-Decision Decoding of BCH Codes,” 
IEEE Transactions on Information Theory, Vol. 40, March 1994. 

[13] G.D. Forney, Jr., “Dimension/Length Profiles and Trellis Complexity of Linear Block 
Codes,” IEEE Transactions on Information Theory, Vol. 40, No. 6, pp. 1741-1751, Nov. 
1994. 

[14] G.D. Forney, Jr., “Dimension/Length Profiles and Trellis Complexity of Lattices,” 
IEEE Transactions on Information Theory, Vol. 40, No. 7, pp. 1753-1772, Nov. 1994. 

[15] G.D. Forney, Jr., “Trellises old and new,” Communications and Cryptography, Edited 
by R.E. Blahut and D.J. Costello, U. Maurer and T. Mittelholzer, pp. 115-128, Kluwer 
Academic Publishers, 1994. 

[16] Hari T. Moorthy and S. Lin, “On the Labeling of Minimal Trellises for Linear Block 
Codes,” Proceedings of the International Symposium on Information Theory and Its 
Applications 1994, Vol. 1, pp. 33-38, Institution of Engineers, Australia. 


[17] A. Lafourcade and A. Vardy, “Asymptotically Good Codes have Infinite Trellis Com-
plexity,” IEEE Transactions on Information Theory, Vol. 41, No. 2, pp. 555-559, March
1995.

[18] O. Ytrehus, “On the trellis complexity of linear block codes,” IEEE Transactions on
Information Theory, Vol. 41, No. 2, pp. 559-560, March 1995.

[19] Y. Berger and Y. Be’ery, “Trellis-Oriented Decomposition and Trellis Complexity of
Composite-Length Cyclic Codes,” IEEE Transactions on Information Theory, Vol. 41,
No. 5, pp. 1185-1191, July 1995.

[20] F.R. Kschischang, and V. Sorokine, “On the trellis structure of block codes,” IEEE 
Transactions on Information Theory, Vol. 41, No. 6, pp. 1924-1937, Nov. 1995. 

[21] A. Lafourcade and A. Vardy, “Lower bounds on trellis complexity of block codes,” 
IEEE Transactions on Information Theory, Vol. 41, No. 6, pp. 1924-1937, Nov. 1995. 

[22] A. Lafourcade and A. Vardy, “Optimal Sectionalization of a trellis,” submitted to IEEE
Transactions on Information Theory, 1995, to appear.

[23] R.J. McEliece, “On the BCJR Trellis for Linear Block Codes,” submitted to IEEE
Transactions on Information Theory, 1995.

[24] T. Fujiwara, H. Yamamoto, T. Kasami and S. Lin, “A Recursive Maximum Likelihood 
Decoding Procedure for a Linear Block Code Using a Sectionalized Trellis Diagram and 
Its Optimization,” (invited paper) Proceedings of The Thirty-Third Annual Allerton 
Conference on Communication, Control and Computing, Allerton House, Monticello,
Illinois, Oct. 4-6, 1995, also submitted to IEEE Transactions on Information Theory, 
special issue on Codes and Complexity, 1995. 

[25] T. Fujiwara, T. Kasami, R.M. Zaragoza and S. Lin, “The State Complexity of Trel-
lis Diagrams for a Class of Generalized Concatenated Codes,” submitted to IEEE
Transactions on Information Theory, 1994. (in revision). 

[26] P.J. Black and T.H. Meng, “A 140-Mb/s, 32-State, Radix-4 Viterbi Decoder,” IEEE
Journal of Solid-State Circuits, Vol. 27, Dec. 1992.

[27] P.G. Gulak and T. Kailath, “Locally Connected VLSI Architectures for the Viterbi
Algorithm,” IEEE Journal on Selected Areas in Communications, Vol. 6, pp. 526-537,
April 1988.

[28] O. M. Collins, “The Subtleties and Intricacies of Building a Constraint Length 15
Convolutional Decoder,” IEEE Transactions on Communications, Vol. 40, No. 12, pp.
1810-1819, Dec. 1992.

[29] B.S. Vishwanath, “Soft-Decision Viterbi Decoding of the (32,16) Reed-Muller Code 
and Its VLSI Implementation,” M.S. Thesis, Department of Electrical Engineering,
University of Hawaii at Manoa, Aug. 1993. 

[30] G. Fettweis and H. Meyr, “Parallel Viterbi Algorithm Implementation: Breaking the 
ACS-Bottleneck,” IEEE Transactions on Communications, Vol. 37, pp. 785-789, Aug.
1989. 

[31] S. Lin, G. T. Uehara, E. Nakamura and W. P. Chu, “Circuit Design Approaches for 
Implementation of a Subtrellis IC for Reed-Muller subcode,” NASA Technical Report
No. 96-001, February 1996. 

[32] H. Thapar and J. Cioffi, “A block processing method for designing high-speed Viterbi 
detectors,” Proceedings of the ICC, Vol. 2, pp. 1096-1100, June 1989.

[33] A. K. Yeung and J. M. Rabaey, “A 210 Mb/s Radix-4 Bit-level Pipelined Viterbi 
Decoder,” ISSCC 1995 Digest of Technical Papers, San Francisco, CA Feb. 1995. 

[34] S. Lin and D.J. Costello, “Error Control Coding: Fundamentals and Applications,” 
Prentice-Hall, 1983. 

[35] M. Fossorier and S. Lin, “Coset Codes Viewed as Terminated Convolutional Codes,” 
submitted to IEEE Transactions on Communications, February 1995 (revised June 
1995). 





Table 1: Set of row spans of trellis oriented generator matrix of (32,21,6) extended and
permuted BCH code

row-#   span       row-#   span
  1     [1,8]       12     [12,26]
  2     [2,15]      13     [13,20]
  3     [3,13]      14     [14,22]
  4     [4,14]      15     [15,27]
  5     [5,12]      16     [17,24]
  6     [6,18]      17     [18,31]
  7     [7,21]      18     [19,29]
  8     [8,25]      19     [20,30]
  9     [9,16]      20     [21,28]
 10     [10,23]     21     [25,32]
 11     [11,19]


Table 2: Parameters of 4-section trellis of (32,21,6) extended and permuted BCH code

l       0    1    2    3    4
SCP     0    7    9    7    0
CBP     -    0    4    6    7
EBP     7    6    4    0    -

i = ACS-#     Connectivity of ACS-i
0             128
1 - 511       64


Table 3: Parameters of 4-section trellis of (32,16) subcode of the (32,21,6) extended and
permuted BCH code

l       0    1    2    3    4
SCP     0    4    4    4    0
CBP     -    0    4    4    4
EBP     4    4    4    0    -

i = ACS-#     Connectivity of ACS-i
0 - 511       16


Table 4: Trellises for all RM and BCH codes of lengths 32, 64

No of Sections L                    64   32   16    8    4    2

  1  RM(32,6,16)     P_max,L              4    4    4    3    4
                     s_max,L(C')          1    1    1    1    0
  2  BCH(32,11,12)   P_max,L              9    9    9    7
                     s_max,L(C')          1    1    1    2
  3  RM(32,16,8)     P_max,L              5    4    5    3
                     s_max,L(C')          4    4    3    3
  4  BCH(32,21,6)    P_max,L              9    4    4    5
                     s_max,L(C')          8    6    6    4
  5  RM(32,26,4)     P_max,L              1    1    2
                     s_max,L(C')          4    4    3
  6  RM(64,7,32)     P_max,L         5    5    5    5    4    5
                     s_max,L(C')     1    1    1    1    1    0
  7  BCH(64,10,28)   P_max,L        10   10   10   10   10   10
                     s_max,L(C')     0    0    0    0    0    0
  8  BCH(64,16,24)   P_max,L        14   14   14   12   13   14
                     s_max,L(C')     1    1    1    2    1
  9  BCH(64,18,22)   P_max,L        16   16   16   14   16   16
                     s_max,L(C')     1    1    1    2    2    2
 10  RM(64,22,16)    P_max,L         9    9    8    9    6
                     s_max,L(C')     5    5    5    4    4
 11  BCH(64,24,16)   P_max,L        11   11   10   11    8
                     s_max,L(C')     5    5    5    4    4
 12  BCH(64,30,14)   P_max,L        15   13   14   11   14
                     s_max,L(C')     6    7    6    7    4
 13  BCH(64,36,12)   P_max,L        10    9   10    9    8
                     s_max,L(C')    10   10    9    8    8
 14  BCH(64,39,10)   P_max,L         7    8    9   10   11
                     s_max,L(C')    13   12   11    9    8
 15  RM(64,42,8)     P_max,L         5    6    5    7
                     s_max,L(C')     9    8    8    6
 16  BCH(64,45,8)    P_max,L         2    3    4    4
                     s_max,L(C')    12   11   10    9
 17                  P_max,L         1         1    2
                     s_max,L(C')              11   10
 18  RM(64,57,4)     P_max,L         1    1    1    2
                     s_max,L(C')     5    5    5    4
Figure 3: An 8-section, 64-state subtrellis for the (64,35,8) subcode of the (64,40,8) RM
subcode

Figure 4: Sequence for decoding using concurrent bidirectional execution sequence
(order of processing: Sec. 1, Sec. 8, Sec. 2, Sec. 7, Sec. 3, Sec. 6, Sec. 4, Sec. 5, Combine and Resolve)

Figure 5: Block diagram of overall decoder with 32 Viterbi decoders